Morphology, Syntax, and What’s in Between: Reifying Cross-linguistic Variation through Cross-linguistic Parsing

Author

  • REUT TSARFATY
Abstract

A central part of NLP research is devoted to the development of parsing systems that analyze the way words combine to form phrases and sentences. This analysis is considered the first step towards natural language understanding, or, as it is often put, extracting “who did what to whom”. Such information is crucial for computer programs that perform tasks such as information retrieval, question answering, and machine translation, to name just a few. To date, parsing systems developed for and applied to English show excellent performance, but applying the same models to languages such as German, Czech, Arabic, Hebrew, Turkish, Hindi, and more does not necessarily yield comparable results. This discrepancy gives rise to a fascinating challenge, namely: Is it possible to devise a single parsing system that is abstract enough to accommodate different languages, and yet specific enough to learn from data the structure and properties of particular languages? Addressing this question is not only technologically challenging, but is also of the utmost importance from a scientific point of view. The number of languages in the world is estimated at around 4,000 to 6,000, and yet it is striking to realize how similar the different languages are in the principles underlying their organization. To refer formally to the commonalities and differences between languages, the linguist Noam Chomsky used the term universal grammar (UG), referring to a set of principles that are shared by all languages, and parameter settings that differentiate languages from one another. Modern typological studies following the work of Sapir (1921), Greenberg (1963), and others play an important part in unraveling the different dimensions of variation between languages (the ‘principles’) and instantiating their possible values (the parameter ‘settings’).
Typology tells us, for instance, that all languages have nouns and verbs, and that the relation between a noun and a verb can be that of a subject or that of an object. This is the functional dimension of language description. Languages vary, however, in how they express these concepts. For instance, subjects in English tend to appear before the verb, but in Warlpiri, subjects can appear almost anywhere in the sentence. This is the basic word-order (or structural) dimension of language variation (Greenberg 1963). Languages also vary in how much information is expressed within a single word. In English, for instance, the subject, verb, and object of a sentence are expressed in different words. In Arabic, Hebrew, or Turkish, the subject, verb, and object may appear within a single word. This is the morphological dimension of variation (Sapir 1921).
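
The three dimensions described above can be made concrete with a minimal sketch. The `Realization` record below is hypothetical (it is not taken from the paper or from any parsing library), and the Hebrew form re'itiv ("I saw him") is an illustrative example chosen here. The sketch only assumes that each grammatical function can be traced to the surface word that expresses it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Realization:
    """Hypothetical record: how one language expresses "who did what to whom"."""
    language: str
    tokens: Tuple[str, ...]  # surface words, in sentence order
    subject: int             # index of the word expressing the subject
    verb: int                # index of the word expressing the verb
    obj: int                 # index of the word expressing the object

    def words_used(self) -> int:
        """Morphological dimension: distinct words carrying the three functions."""
        return len({self.subject, self.verb, self.obj})

    def subject_precedes_verb(self) -> bool:
        """Structural dimension: does the subject word come before the verb word?"""
        return self.subject < self.verb


# Functional dimension: both entries encode the same subject/verb/object relations.
english = Realization("English", ("I", "saw", "him"), subject=0, verb=1, obj=2)
hebrew = Realization("Hebrew", ("re'itiv",), subject=0, verb=0, obj=0)

for r in (english, hebrew):
    print(r.language, r.words_used(), r.subject_precedes_verb())
```

In this toy encoding the functional content is identical across the two entries, while `words_used()` and `subject_precedes_verb()` differ. That is the kind of variation a cross-linguistic parser would need to learn from data rather than assume.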

Publication date: 2011